{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--BOOK_INFORMATION-->\n",
    "<a href=\"https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv\" target=\"_blank\"><img align=\"left\" src=\"data/cover.jpg\" style=\"width: 76px; height: 100px; background: white; padding: 1px; border: 1px solid black; margin-right:10px;\"></a>\n",
    "*This notebook contains an excerpt from the book [Machine Learning for OpenCV](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv) by Michael Beyeler.\n",
    "The code is released under the [MIT license](https://opensource.org/licenses/MIT),\n",
    "and is available on [GitHub](https://github.com/mbeyeler/opencv-machine-learning).*\n",
    "\n",
    "*Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations.\n",
    "If you find this content useful, please consider supporting the work by\n",
    "[buying the book](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv)!*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Selecting the Right Model with Hyper-Parameter Tuning](11.00-Selecting-the-Right-Model-with-Hyper-Parameter-Tuning.ipynb) | [Contents](../README.md) | [Understanding Cross-Validation](11.02-Understanding-Cross-Validation-Bootstrapping-and-McNemar's-Test.ipynb) >"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluating a Model\n",
    "\n",
    "Model evaluation strategies come in many different forms and shapes. In the following\n",
    "sections, we will, therefore, highlight three of the most commonly used techniques to\n",
    "compare models against each other:\n",
    "- $k$-fold cross-validation\n",
    "- bootstrapping\n",
    "- McNemar's test\n",
    "\n",
    "In principle, model evaluation is simple: after training a model on some data, we can\n",
    "estimate its effectiveness by comparing model predictions to some ground truth values. We\n",
    "learned early on that we should split the data into a training and a test set, and we tried to\n",
    "follow this instruction whenever possible. But why exactly did we do that again?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluating a model the wrong way\n",
    "\n",
    "The reason we never evaluate a model on the training set is that, in principle, any dataset\n",
    "can be learned if we throw a strong enough model at it.\n",
    "\n",
    "A quick demonstration of this can be given with help of the Iris dataset, which we talked\n",
    "about extensively in [Chapter 3](03.00-First-Steps-in-Supervised-Learning.ipynb), *First Steps in Supervised Learning*. There the goal was to\n",
    "classify species of Iris flowers based on their physical dimensions. We can load the Iris\n",
    "dataset using scikit-learn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_iris\n",
    "iris = load_iris()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An innocent approach to this problem would be to store all data points in matrix `X` and all\n",
    "class labels in the vector `y`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "X = iris.data.astype(np.float32)\n",
    "y = iris.target"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we choose a model and its hyperparameters. For example, let's use the $k$-NN\n",
    "algorithm from [Chapter 3](03.00-First-Steps-in-Supervised-Learning.ipynb), *First Steps in Supervised Learning*, which provides only a single\n",
    "hyperparameter: the number of neighbors, $k$. With $k=1$, we get a very simple model that\n",
    "classifies the label of an unknown point as belonging to the same class as its closest\n",
    "neighbor.\n",
    "\n",
    "In OpenCV, $k$-NN instantiates as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import cv2\n",
    "knn = cv2.ml.KNearest_create()\n",
    "knn.setDefaultK(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we train the model and use it to predict labels for the data that we already know:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "knn.train(X, cv2.ml.ROW_SAMPLE, y)\n",
    "_, y_hat = knn.predict(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we compute the fraction of correctly labeled points:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.metrics import accuracy_score\n",
    "accuracy_score(y, y_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see an accuracy score of 1.0, which indicates that 100% of points were correctly labeled\n",
    "by our model!\n",
    "\n",
    "But is this truly measuring the expected accuracy? Have we really come up with a model\n",
    "that we expect to be correct 100% of the time?\n",
    "\n",
    "As you may have gathered, the answer is no. This example shows that even a simple\n",
    "algorithm is capable of memorizing a real-world dataset. Imagine how easy this task would\n",
    "have been for a deep neural network! Usually, the more parameters a model has, the more\n",
    "powerful it is. We will come back to this shortly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluating a model the right way\n",
    "\n",
    "A better sense of a model's performance can be found using what's known as a test set, but\n",
    "you already knew this. When presented with data held out from the training procedure, we\n",
    "can check whether a model has learned some dependencies in the data that hold across the\n",
    "board or whether it just memorized the training set.\n",
    "\n",
    "We can split the data into training and test sets using the familiar `train_test_split` from\n",
    "scikit-learn's `model_selection` module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But how do we choose the right train-test ratio? Is there even such a thing as a right ratio?\n",
    "Or is this considered another hyperparameter of the model?\n",
    "\n",
    "There are two competing concerns here:\n",
    "- If our training set is too small, our model might not be able to extract the relevant data dependencies. As a result, our model performance might differ significantly from run to run, that is, if we repeat our experiment multiple times with different random number seeds. As an extreme example, consider a training set with a single data point from the Iris dataset. In this case, there would be no way for the model to even learn that there are multiple species in the dataset!\n",
    "- If our test set is too small, our performance metric might differ significantly from run to run. As a result, we would have to rerun our experiment multiple times to get an idea of how well our model does on average. As an extreme example, consider a test set with a single data point. Since there are three different classes in the Iris dataset, we might get either 0, 33%, 66%, or 100% correct.\n",
    "\n",
    "A good starting point is usually a 80-20 training-test split. However, it all depends on the\n",
    "amount of data available. For relatively small datasets, a 50-50 split might be more suitable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=37,\n",
    "                                                    train_size=0.8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we retrain the preceding model on the training set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "knn = cv2.ml.KNearest_create()\n",
    "knn.setDefaultK(1)\n",
    "knn.train(X_train, cv2.ml.ROW_SAMPLE, y_train);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we test the model on the test set, we suddenly get a different result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.96666666666666667"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "_, y_test_hat = knn.predict(X_test)\n",
    "accuracy_score(y_test, y_test_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see a more reasonable result here, although 97% accuracy is still a formidable result. But\n",
    "is this the best possible result—and how can we know for sure?\n",
    "\n",
    "To answer this question is, we have to dig a little deeper."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Selecting the Right Model with Hyper-Parameter Tuning](11.00-Selecting-the-Right-Model-with-Hyper-Parameter-Tuning.ipynb) | [Contents](../README.md) | [Understanding Cross-Validation](11.02-Understanding-Cross-Validation-Bootstrapping-and-McNemar's-Test.ipynb) >"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
